Data Visualization
About the data
| Column name | Column meaning | Example value |
|---|---|---|
| raw_row_number | A number used to join clean data back to the raw data | 38299 |
| date_time | The date and time of the stop, in YYYY-MM-DD HH:MM format. | "2017-02-02 20:15" |
| location | The freeform text of the location. Occasionally, this represents the concatenation of several raw fields, e.g. street_number and street_name. | "248 Stockton Rd." |
| county_name | County name where provided | "Allegheny County" |
| subject_age | The age of the stopped subject. When date of birth is given, we calculate the age based on the stop date. Values outside the range of 10-110 are coerced to NA. | 54.23 |
| subject_race | The race of the stopped subject. Values are standardized to white, black, hispanic, asian/pacific islander, and other/unknown | "hispanic" |
| subject_sex | The recorded sex of the stopped subject. | "female" |
| officer_id_hash | A unique hash of the officer id used to identify individual officers within a location. This is usually just a hash of the provided officer ID or badge number. | "a888fdc120" |
| department_name | Name of department or subdivision to which officer has been assigned. | "Charlotte-Mecklenburg Police Department" |
| type | Type of stop: vehicular or pedestrian. | "vehicular" |
| arrest_made | Indicates whether an arrest was made. | FALSE |
| citation_issued | Indicates whether a citation was issued. | TRUE |
| warning_issued | Indicates whether a warning was issued. | TRUE |
| outcome | The strictest action taken among arrest, citation, warning, and summons. | "citation" |
| frisk_performed | Indicates whether a frisk was performed. This is technically different from a search, but departments will sometimes include frisks as a search type. | TRUE |
| search_conducted | Indicates whether any type of search was conducted, e.g. of the driver, a passenger, or the vehicle. Frisks are excluded where the department has provided resolution on both. | TRUE |
| search_person | Indicates whether a search of a person has occurred. This is only defined when search_conducted is TRUE. | TRUE |
| search_vehicle | Indicates whether a search of a vehicle has occurred. This is only defined when search_conducted is TRUE. | TRUE |
| search_basis | The reason for the search, where provided, categorized into k9, plain view, consent, probable cause, and other. If a search occurred but the reason wasn't listed, we assume probable cause. | "consent" |
| reason_for_stop | A freeform text field indicating the reason for the stop where provided. | "EQUIPMENT MALFUNCTION" |
| raw_Ethnicity | The raw data's Hispanic/non-Hispanic column | ['H', 'N'] |
| raw_Race | The raw data's race column | ['W', 'B', 'A', 'U', 'I'] |
| raw_action_description | The raw data's police stop outcome | ['Citation Issued', 'Verbal Warning', 'Written Warning', 'On-View Arrest', 'No Action Taken'] |
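Two of the derived columns above, outcome and subject_age, follow simple rules that can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `strictest_outcome` and `subject_age` are hypothetical, the severity order (arrest, then citation, then warning) is assumed rather than taken from the dataset's documentation, summons is left out because the table has no summons flag, and the age is approximated with a 365.25-day year.

```python
from datetime import date
from typing import Optional

# Assumed severity order, strictest first (not confirmed by the source).
OUTCOME_ORDER = ("arrest", "citation", "warning")


def strictest_outcome(arrest_made: bool, citation_issued: bool,
                      warning_issued: bool) -> Optional[str]:
    """Return the strictest recorded action, or None if none was recorded."""
    flags = {
        "arrest": arrest_made,
        "citation": citation_issued,
        "warning": warning_issued,
    }
    for name in OUTCOME_ORDER:
        if flags[name]:
            return name
    return None


def subject_age(date_of_birth: date, stop_date: date) -> Optional[float]:
    """Age in years at the stop; values outside 10-110 become None (NA)."""
    age = (stop_date - date_of_birth).days / 365.25  # approximate year length
    return round(age, 2) if 10 <= age <= 110 else None


# A stop with both a citation and a warning is recorded as a citation.
print(strictest_outcome(False, True, True))
# A seven-year-old falls outside the 10-110 range, so the age is dropped.
print(subject_age(date(2010, 1, 1), date(2017, 2, 2)))
```

In the actual dataset these rules would be applied column-wise; the scalar functions here only make the logic explicit.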
Exploratory Data Analysis (EDA)
In [11]:
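# Profile the training DataFrame `train` (assumed to be loaded earlier in the notebook).
# Newer releases of this package are published as ydata-profiling (import ydata_profiling).
from pandas_profiling import ProfileReport
ProfileReport(train)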
Model Examination
In [12]:
import pandas as pd

# Build the model-comparison table by hand: one row per model.
Models = ["F KNN", "F LR", "F DT", "S KNN", "S LR", "S DT"]
R2 = [0.635, 0.723, 0.639, 0.565, 0.703, 0.580]
Acc = [0.7, 0.8, 0.4, 0.63, 0.77, 0.35]
modeln = pd.DataFrame({
    "Model": Models,
    "R2": R2,
    "Accruacy": Acc,  # spelling kept: the dashboard below looks up this exact key
})
modeln.head()
Out[12]:
| | Model | R2 | Accruacy |
|---|---|---|---|
| 0 | F KNN | 0.635 | 0.70 |
| 1 | F LR | 0.723 | 0.80 |
| 2 | F DT | 0.639 | 0.40 |
| 3 | S KNN | 0.565 | 0.63 |
| 4 | S LR | 0.703 | 0.77 |
In [13]:
# Modified from https://www.youtube.com/watch?v=FuJOsZgo4nU
import plotly.express as px
from dash import dcc, html
from dash.dependencies import Input, Output
from jupyter_dash import JupyterDash

app = JupyterDash(__name__)

# Layout: a title, one radio group per axis, and the graph they control.
app.layout = html.Div([
    html.Div([
        html.Pre(children="Model Comparison",
                 style={"textAlign": "center", "fontSize": "100%", "color": "black"})
    ]),
    html.Div([
        html.Label(['X-axis:'], style={'fontWeight': 'bold'}),
        dcc.RadioItems(
            id='xaxis_raditem',
            options=[
                {'label': 'Models', 'value': 'Model'},
                # {'label': 'data', 'value': 'Data'},  # add more x-axis options here
            ],
            value='Model',
            style={"width": "50%"}
        ),
    ]),
    html.Div([
        html.Br(),
        html.Label(['Y-axis:'], style={'fontWeight': 'bold'}),
        dcc.RadioItems(
            id='yaxis_raditem',
            options=[
                {'label': 'R2', 'value': 'R2'},
                {'label': 'Accuracy', 'value': 'Accruacy'},
                # {'label': 'data', 'value': 'Data'},  # add more y-axis options here
            ],
            value='Accruacy',
            style={"width": "50%"}
        ),
    ]),
    html.Div([
        dcc.Graph(id='the_graph')
    ]),
])


# Redraw the bar chart whenever either radio selection changes.
@app.callback(
    Output(component_id='the_graph', component_property='figure'),
    [Input(component_id='xaxis_raditem', component_property='value'),
     Input(component_id='yaxis_raditem', component_property='value')]
)
def update_graph(x_axis, y_axis):
    dff = modeln
    barchart = px.bar(
        data_frame=dff,
        x=x_axis,
        y=y_axis,
        title=f'{y_axis} by {x_axis}',
    )
    # Sort bars from lowest to highest value and center the title.
    barchart.update_layout(xaxis={'categoryorder': 'total ascending'},
                           title={'xanchor': 'center', 'yanchor': 'top', 'y': 0.9, 'x': 0.5})
    return barchart


if __name__ == '__main__':
    app.run_server(mode='inline')
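The `'categoryorder': 'total ascending'` setting in the callback sorts the bars from the lowest to the highest value of the selected metric. As a quick check outside Dash, the plain-Python sketch below reproduces that ordering from the scores in the model table. The `scores` dictionary and the `bar_order` helper are illustrative names, not part of the notebook, and the "Accruacy" key keeps the spelling used in the DataFrame.

```python
from typing import Dict, List

# Scores copied from the model-comparison table in this notebook.
scores: Dict[str, Dict[str, float]] = {
    "F KNN": {"R2": 0.635, "Accruacy": 0.70},
    "F LR": {"R2": 0.723, "Accruacy": 0.80},
    "F DT": {"R2": 0.639, "Accruacy": 0.40},
    "S KNN": {"R2": 0.565, "Accruacy": 0.63},
    "S LR": {"R2": 0.703, "Accruacy": 0.77},
    "S DT": {"R2": 0.580, "Accruacy": 0.35},
}


def bar_order(metric: str) -> List[str]:
    """Model names sorted by the chosen metric, lowest first."""
    return sorted(scores, key=lambda name: scores[name][metric])


print(bar_order("Accruacy"))
# ['S DT', 'F DT', 'S KNN', 'F KNN', 'S LR', 'F LR']
print(bar_order("R2"))
# ['S KNN', 'S DT', 'F KNN', 'F DT', 'S LR', 'F LR']
```

Under either metric the "F LR" model ends up as the rightmost (highest) bar.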